Building on the simulations in sec-simul-dataset, this section estimates the spectrum of fairness. It complements Section 4.2 of the main paper, linked from the supplement’s introduction. We recommend keeping the main paper nearby, as this supplement focuses exclusively on implementation.
Packages for this section
library(tidyverse)
library(latex2exp)
library(jsonlite)

## also set up python for optimal transport.
library(reticulate)

get_python_env_path <- function(file_path = "python_env_path.txt") {
  if (!file.exists(file_path)) {
    stop("The file 'python_env_path.txt' is missing. Please create it and specify your Python environment path.")
  }
  # Read the file and extract the first non-comment, non-empty line
  env_path <- readLines(file_path, warn = FALSE)
  env_path <- env_path[!grepl("^#", env_path) & nzchar(env_path)]
  if (length(env_path) == 0) {
    stop("No valid Python environment path found in 'python_env_path.txt'. Please enter your path.")
  }
  return(trimws(env_path[1]))
}

tryCatch({
  python_env_path <- get_python_env_path()
  use_python(python_env_path, required = TRUE)
  message("Python environment successfully set to: ", python_env_path)
}, error = function(e) {
  stop(e$message)
})
Data for this section
sims <- fromJSON('simuls/train_scenarios.json')
valid <- fromJSON('simuls/valid_scenarios.json')
test <- fromJSON('simuls/test_scenarios.json')

## For the experiment in section 5
sim_samples <- jsonlite::fromJSON('simuls/sim_study.json')
In this section, we provide a detailed example of the methodology for estimating the benchmark premiums from the spectrum of fairness defined in Section 4.1 of the main paper.
In this paper, premiums are estimated using the lightgbm algorithm of Ke et al. (2017), a package that supports common actuarial distributions (Tweedie, Poisson, Gamma, etc.) and was found to have excellent predictive performance compared to other boosting algorithms in Chevalier and Côté (2024). Given lightgbm’s flexibility, we advocate for careful variable preselection to:
Ensure comprehensive coverage of risk factors,
Reduce redundancy and potential multicollinearity while preserving predictive value.
We optimize hyperparameters with a validation set to prevent overfitting. Key hyperparameters in lightgbm are the number of trees (num_iterations), regularization (lambda_l1, lambda_l2) for complexity control, and subsampling percentages (feature_fraction, bagging_fraction).
The data used to construct prices and the estimated spectrum should align to ensure comparability. In the Case Study of the main paper, we conduct a posteriori benchmarking, as the study period is historical. A posteriori assessment can reveal, after the fact, segments where the model deviated from the fairness pillars. In contrast, conducting benchmarking concurrently with the development of a new pricing model provides an estimate of how well the commercial price aligns with the fairness pillars for future periods, though the true fairness implications of the commercial price will only become evident retrospectively.
Discretizing \(D\)
Subsequent steps involve numerous computations over all possible values of \(D\). When \(D\) is continuous, discretization can improve tractability. The discretized version should be used throughout this appendix, and the discretization function should be kept for future applications. For multivariate \(D\), one approach is to first discretize continuous components, then treat the vector as one categorical variable where each unique combination defines a category. Care should be taken to ensure a manageable number of categories, with sufficient representation in each.
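As an illustrative sketch of this discretization step (all object names below, such as d_train and d2_cat, are hypothetical, not objects defined in this supplement), a quantile-based binning in R that keeps its breaks for reuse could look like:

```r
# Hedged sketch: quantile-based discretization of a continuous D.
# The returned function stores the breaks so the exact same mapping
# can be reused in future applications, as recommended above.
make_discretizer <- function(d_cont, n_bins = 5) {
  breaks <- unique(quantile(d_cont, probs = seq(0, 1, length.out = n_bins + 1)))
  breaks[1] <- -Inf
  breaks[length(breaks)] <- Inf
  function(d_new) cut(d_new, breaks = breaks, include.lowest = TRUE)
}

set.seed(1)
d_train <- rnorm(1000)                     # hypothetical continuous D
discretize_d <- make_discretizer(d_train, n_bins = 4)
d_cat <- discretize_d(d_train)
table(d_cat)                               # check representation in each category

# Multivariate D: discretize continuous components first, then combine
# each unique combination into one categorical variable.
d2_cat <- sample(c("a", "b"), 1000, replace = TRUE)
d_joint <- interaction(d_cat, d2_cat, drop = TRUE)
```

Inspecting `table(d_joint)` is a quick way to confirm the number of combined categories stays manageable with sufficient representation in each.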
Balancing benchmark premiums
Profit and expense loadings are important, but they must not reintroduce unfairness. To align the premium benchmarks with expected profits, we multiply each benchmark premium \(j \in \{B, U, A, H, C\}\) by a factor \(\delta_j \geq 1\) that globally balances it to the level of the commercial price.
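Under the assumption that per-policy commercial premiums are available, each \(\delta_j\) reduces to a ratio of totals. A minimal sketch with hypothetical toy numbers (premium_commercial and premium_B are illustrative vectors, not objects from this section):

```r
# Hedged sketch: globally balance a benchmark premium (here the
# best-estimate, j = B) to the level of the commercial price.
premium_commercial <- c(110, 95, 130)   # hypothetical commercial premiums
premium_B <- c(100, 90, 120)            # hypothetical best-estimate premiums

delta_B <- sum(premium_commercial) / sum(premium_B)
premium_B_balanced <- delta_B * premium_B

sum(premium_B_balanced)  # equals sum(premium_commercial) by construction
```

Because the same factor \(\delta_j\) applies to every policy, relativities between insureds, and hence the fairness properties of the benchmark, are unchanged.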
In what follows, we detail how we estimate each benchmark premium in our framework.
2.1 Best-estimate premium
The best-estimate premium \(\widehat{\mu}^B(\mathbf{x}, d)\) serves as the anchor, so its estimation is crucial for our framework. It provides the best predictor of the response variable \(Y\) when differentiating risks based on \((\mathbf{X}, D)\). It can be derived from indicated rates or data-driven estimates (the approach we take), as detailed in Complement 5 of the main paper.
Training the data-driven best-estimate price
source('___lgb_best_estimate.R')

## clean the pred repo
unlink(file.path('preds', "*_best_estimate.json"))

# Define grid for hyperparameter optimization
hyperparameter_grid <- expand.grid(
  learning_rate = c(0.01),
  feature_fraction = c(0.75),
  bagging_fraction = c(0.75),
  max_depth = c(5),
  lambda_l1 = c(0.5),
  lambda_l2 = c(0.5)
)

best_lgb <- setNames(nm = names(sims)) %>%
  lapply(function(name){
    list_df <- list(
      'train' = sims[[name]],
      'valid' = valid[[name]],
      'test' = test[[name]]
    )
    the_best_estimate_lightgbm_fun(
      list_data = list_df,
      name = name,
      hyperparameter_grid = hyperparameter_grid
    )
  })
Best for scenario: Scenario1
hyperparam 1 / 1
Best valid mse: 52.35364
optimal ntree: 474
Training time: 7.954376 sec.
Best for scenario: Scenario2
hyperparam 1 / 1
Best valid mse: 52.10566
optimal ntree: 594
Training time: 9.149719 sec.
Best for scenario: Scenario3
hyperparam 1 / 1
Best valid mse: 52.87658
optimal ntree: 477
Training time: 7.268097 sec.
Training the best-estimate price on experimental data (for later use in section 5)
Best for scenario: sim1
hyperparam 1 / 1
Best valid mse: 60.33479
optimal ntree: 392
Training time: 4.949132 sec.
Best for scenario: sim2
hyperparam 1 / 1
Best valid mse: 50.91685
optimal ntree: 336
Training time: 5.07208 sec.
Best for scenario: sim3
hyperparam 1 / 1
Best valid mse: 51.20826
optimal ntree: 288
Training time: 4.947608 sec.
Best for scenario: sim4
hyperparam 1 / 1
Best valid mse: 56.32536
optimal ntree: 524
Training time: 10.76273 sec.
Best for scenario: sim5
hyperparam 1 / 1
Best valid mse: 54.42696
optimal ntree: 381
Training time: 7.418698 sec.
Best for scenario: sim6
hyperparam 1 / 1
Best valid mse: 53.72137
optimal ntree: 365
Training time: 7.3409 sec.
Best for scenario: sim7
hyperparam 1 / 1
Best valid mse: 60.79705
optimal ntree: 316
Training time: 6.016717 sec.
Best for scenario: sim8
hyperparam 1 / 1
Best valid mse: 59.78302
optimal ntree: 504
Training time: 7.785273 sec.
Best for scenario: sim9
hyperparam 1 / 1
Best valid mse: 53.2293
optimal ntree: 294
Training time: 5.121347 sec.
Best for scenario: sim10
hyperparam 1 / 1
Best valid mse: 60.77194
optimal ntree: 304
Training time: 4.905564 sec.
Best for scenario: sim11
hyperparam 1 / 1
Best valid mse: 49.90746
optimal ntree: 302
Training time: 5.065884 sec.
Best for scenario: sim12
hyperparam 1 / 1
Best valid mse: 50.79596
optimal ntree: 328
Training time: 5.070839 sec.
Best for scenario: sim13
hyperparam 1 / 1
Best valid mse: 51.64612
optimal ntree: 317
Training time: 6.545786 sec.
Best for scenario: sim14
hyperparam 1 / 1
Best valid mse: 60.85369
optimal ntree: 309
Training time: 8.622002 sec.
Best for scenario: sim15
hyperparam 1 / 1
Best valid mse: 57.2588
optimal ntree: 351
Training time: 9.125745 sec.
Best for scenario: sim16
hyperparam 1 / 1
Best valid mse: 57.91143
optimal ntree: 330
Training time: 12.01773 sec.
Best for scenario: sim17
hyperparam 1 / 1
Best valid mse: 55.78472
optimal ntree: 275
Training time: 11.12442 sec.
Best for scenario: sim18
hyperparam 1 / 1
Best valid mse: 58.69825
optimal ntree: 339
Training time: 9.822523 sec.
Best for scenario: sim19
hyperparam 1 / 1
Best valid mse: 58.80992
optimal ntree: 525
Training time: 12.27094 sec.
Best for scenario: sim20
hyperparam 1 / 1
Best valid mse: 49.75454
optimal ntree: 279
Training time: 8.071054 sec.
Best for scenario: sim21
hyperparam 1 / 1
Best valid mse: 50.79592
optimal ntree: 371
Training time: 11.29008 sec.
Best for scenario: sim22
hyperparam 1 / 1
Best valid mse: 51.96514
optimal ntree: 453
Training time: 13.96027 sec.
Best for scenario: sim23
hyperparam 1 / 1
Best valid mse: 55.38553
optimal ntree: 295
Training time: 12.30759 sec.
Best for scenario: sim24
hyperparam 1 / 1
Best valid mse: 51.9121
optimal ntree: 300
Training time: 11.14981 sec.
Best for scenario: sim25
hyperparam 1 / 1
Best valid mse: 56.23475
optimal ntree: 248
Training time: 11.48121 sec.
Best for scenario: sim26
hyperparam 1 / 1
Best valid mse: 51.90556
optimal ntree: 410
Training time: 14.10936 sec.
Best for scenario: sim27
hyperparam 1 / 1
Best valid mse: 58.62825
optimal ntree: 351
Training time: 11.06315 sec.
Best for scenario: sim28
hyperparam 1 / 1
Best valid mse: 51.38316
optimal ntree: 442
Training time: 11.50402 sec.
Best for scenario: sim29
hyperparam 1 / 1
Best valid mse: 57.27832
optimal ntree: 351
Training time: 10.31913 sec.
Best for scenario: sim30
hyperparam 1 / 1
Best valid mse: 65.73161
optimal ntree: 423
Training time: 11.42914 sec.
Best for scenario: sim31
hyperparam 1 / 1
Best valid mse: 52.43089
optimal ntree: 341
Training time: 11.5907 sec.
Best for scenario: sim32
hyperparam 1 / 1
Best valid mse: 48.98963
optimal ntree: 316
Training time: 8.861043 sec.
Best for scenario: sim33
hyperparam 1 / 1
Best valid mse: 57.39815
optimal ntree: 415
Training time: 9.765657 sec.
Best for scenario: sim34
hyperparam 1 / 1
Best valid mse: 51.84479
optimal ntree: 376
Training time: 12.60381 sec.
Best for scenario: sim35
hyperparam 1 / 1
Best valid mse: 59.83096
optimal ntree: 454
Training time: 12.87287 sec.
Best for scenario: sim36
hyperparam 1 / 1
Best valid mse: 53.5909
optimal ntree: 261
Training time: 8.490556 sec.
Best for scenario: sim37
hyperparam 1 / 1
Best valid mse: 55.13199
optimal ntree: 404
Training time: 13.977 sec.
Best for scenario: sim38
hyperparam 1 / 1
Best valid mse: 54.7921
optimal ntree: 286
Training time: 7.808697 sec.
Best for scenario: sim39
hyperparam 1 / 1
Best valid mse: 54.24569
optimal ntree: 284
Training time: 7.932821 sec.
Best for scenario: sim40
hyperparam 1 / 1
Best valid mse: 54.19071
optimal ntree: 352
Training time: 11.59254 sec.
Best for scenario: sim41
hyperparam 1 / 1
Best valid mse: 52.58791
optimal ntree: 329
Training time: 11.89847 sec.
Best for scenario: sim42
hyperparam 1 / 1
Best valid mse: 52.76784
optimal ntree: 397
Training time: 10.58336 sec.
Best for scenario: sim43
hyperparam 1 / 1
Best valid mse: 62.81313
optimal ntree: 391
Training time: 11.67223 sec.
Best for scenario: sim44
hyperparam 1 / 1
Best valid mse: 55.71322
optimal ntree: 395
Training time: 10.81585 sec.
Best for scenario: sim45
hyperparam 1 / 1
Best valid mse: 60.5693
optimal ntree: 354
Training time: 5.684685 sec.
Best for scenario: sim46
hyperparam 1 / 1
Best valid mse: 56.32787
optimal ntree: 367
Training time: 5.85745 sec.
Best for scenario: sim47
hyperparam 1 / 1
Best valid mse: 55.9285
optimal ntree: 382
Training time: 5.153053 sec.
Best for scenario: sim48
hyperparam 1 / 1
Best valid mse: 46.62527
optimal ntree: 349
Training time: 5.137555 sec.
Best for scenario: sim49
hyperparam 1 / 1
Best valid mse: 53.27347
optimal ntree: 278
Training time: 4.485639 sec.
Best for scenario: sim50
hyperparam 1 / 1
Best valid mse: 55.30913
optimal ntree: 404
Training time: 5.489235 sec.
Best for scenario: sim51
hyperparam 1 / 1
Best valid mse: 55.59591
optimal ntree: 260
Training time: 4.289922 sec.
Best for scenario: sim52
hyperparam 1 / 1
Best valid mse: 51.41044
optimal ntree: 277
Training time: 4.498037 sec.
Best for scenario: sim53
hyperparam 1 / 1
Best valid mse: 57.98406
optimal ntree: 475
Training time: 6.874064 sec.
Best for scenario: sim54
hyperparam 1 / 1
Best valid mse: 55.82739
optimal ntree: 357
Training time: 5.330815 sec.
Best for scenario: sim55
hyperparam 1 / 1
Best valid mse: 63.26344
optimal ntree: 358
Training time: 5.425838 sec.
Best for scenario: sim56
hyperparam 1 / 1
Best valid mse: 59.99558
optimal ntree: 319
Training time: 4.607765 sec.
Best for scenario: sim57
hyperparam 1 / 1
Best valid mse: 46.54049
optimal ntree: 293
Training time: 4.214027 sec.
Best for scenario: sim58
hyperparam 1 / 1
Best valid mse: 50.22334
optimal ntree: 363
Training time: 5.167843 sec.
Best for scenario: sim59
hyperparam 1 / 1
Best valid mse: 57.2901
optimal ntree: 365
Training time: 6.151565 sec.
Best for scenario: sim60
hyperparam 1 / 1
Best valid mse: 59.99731
optimal ntree: 390
Training time: 6.228045 sec.
Best for scenario: sim61
hyperparam 1 / 1
Best valid mse: 57.14639
optimal ntree: 459
Training time: 7.398513 sec.
Best for scenario: sim62
hyperparam 1 / 1
Best valid mse: 57.85747
optimal ntree: 315
Training time: 5.435315 sec.
Best for scenario: sim63
hyperparam 1 / 1
Best valid mse: 56.39258
optimal ntree: 312
Training time: 4.824038 sec.
Best for scenario: sim64
hyperparam 1 / 1
Best valid mse: 48.83371
optimal ntree: 332
Training time: 4.845143 sec.
Best for scenario: sim65
hyperparam 1 / 1
Best valid mse: 62.17314
optimal ntree: 524
Training time: 6.391987 sec.
Best for scenario: sim66
hyperparam 1 / 1
Best valid mse: 50.93092
optimal ntree: 403
Training time: 5.870877 sec.
Best for scenario: sim67
hyperparam 1 / 1
Best valid mse: 55.27313
optimal ntree: 367
Training time: 7.046694 sec.
Best for scenario: sim68
hyperparam 1 / 1
Best valid mse: 62.22941
optimal ntree: 375
Training time: 6.435948 sec.
Best for scenario: sim69
hyperparam 1 / 1
Best valid mse: 53.87731
optimal ntree: 348
Training time: 5.496983 sec.
Best for scenario: sim70
hyperparam 1 / 1
Best valid mse: 55.38108
optimal ntree: 299
Training time: 5.304265 sec.
Best for scenario: sim71
hyperparam 1 / 1
Best valid mse: 55.88654
optimal ntree: 407
Training time: 7.996598 sec.
Best for scenario: sim72
hyperparam 1 / 1
Best valid mse: 53.81861
optimal ntree: 299
Training time: 5.838899 sec.
Best for scenario: sim73
hyperparam 1 / 1
Best valid mse: 54.05302
optimal ntree: 352
Training time: 5.271713 sec.
Best for scenario: sim74
hyperparam 1 / 1
Best valid mse: 53.68097
optimal ntree: 417
Training time: 6.030551 sec.
Best for scenario: sim75
hyperparam 1 / 1
Best valid mse: 52.77939
optimal ntree: 367
Training time: 5.415834 sec.
Best for scenario: sim76
hyperparam 1 / 1
Best valid mse: 54.30186
optimal ntree: 450
Training time: 6.671667 sec.
Best for scenario: sim77
hyperparam 1 / 1
Best valid mse: 56.50446
optimal ntree: 390
Training time: 5.272126 sec.
Best for scenario: sim78
hyperparam 1 / 1
Best valid mse: 54.13193
optimal ntree: 323
Training time: 4.735928 sec.
Best for scenario: sim79
hyperparam 1 / 1
Best valid mse: 63.37444
optimal ntree: 601
Training time: 8.198827 sec.
Best for scenario: sim80
hyperparam 1 / 1
Best valid mse: 56.34832
optimal ntree: 448
Training time: 7.196346 sec.
Best for scenario: sim81
hyperparam 1 / 1
Best valid mse: 52.71023
optimal ntree: 397
Training time: 6.145195 sec.
Best for scenario: sim82
hyperparam 1 / 1
Best valid mse: 60.70158
optimal ntree: 356
Training time: 6.116415 sec.
Best for scenario: sim83
hyperparam 1 / 1
Best valid mse: 52.28889
optimal ntree: 281
Training time: 4.616353 sec.
Best for scenario: sim84
hyperparam 1 / 1
Best valid mse: 49.49985
optimal ntree: 330
Training time: 4.89299 sec.
Best for scenario: sim85
hyperparam 1 / 1
Best valid mse: 56.55887
optimal ntree: 280
Training time: 4.383255 sec.
Best for scenario: sim86
hyperparam 1 / 1
Best valid mse: 56.75379
optimal ntree: 1053
Training time: 8.421956 sec.
Best for scenario: sim87
hyperparam 1 / 1
Best valid mse: 56.33146
optimal ntree: 406
Training time: 5.084105 sec.
Best for scenario: sim88
hyperparam 1 / 1
Best valid mse: 53.97764
optimal ntree: 571
Training time: 6.098414 sec.
Best for scenario: sim89
hyperparam 1 / 1
Best valid mse: 62.56728
optimal ntree: 356
Training time: 5.415524 sec.
Best for scenario: sim90
hyperparam 1 / 1
Best valid mse: 55.01317
optimal ntree: 317
Training time: 7.696967 sec.
Best for scenario: sim91
hyperparam 1 / 1
Best valid mse: 56.06621
optimal ntree: 351
Training time: 5.708203 sec.
Best for scenario: sim92
hyperparam 1 / 1
Best valid mse: 54.37888
optimal ntree: 441
Training time: 5.556663 sec.
Best for scenario: sim93
hyperparam 1 / 1
Best valid mse: 59.14325
optimal ntree: 908
Training time: 10.33407 sec.
Best for scenario: sim94
hyperparam 1 / 1
Best valid mse: 55.5352
optimal ntree: 277
Training time: 5.831233 sec.
Best for scenario: sim95
hyperparam 1 / 1
Best valid mse: 51.61785
optimal ntree: 429
Training time: 6.02435 sec.
Best for scenario: sim96
hyperparam 1 / 1
Best valid mse: 57.50831
optimal ntree: 490
Training time: 5.285491 sec.
Best for scenario: sim97
hyperparam 1 / 1
Best valid mse: 49.39495
optimal ntree: 424
Training time: 4.934937 sec.
Best for scenario: sim98
hyperparam 1 / 1
Best valid mse: 58.76595
optimal ntree: 319
Training time: 4.34882 sec.
Best for scenario: sim99
hyperparam 1 / 1
Best valid mse: 56.64473
optimal ntree: 431
Training time: 6.017951 sec.
Best for scenario: sim100
hyperparam 1 / 1
Best valid mse: 50.86109
optimal ntree: 344
Training time: 5.003828 sec.
2.2 Unaware premium
The unaware price, \(\mu^U(\mathbf{x})\), excludes direct use of \(D\). It is defined as the best predictor of \(\widehat{\mu}^B(\mathbf{X}, D)\) given \(\mathbf{x}\): \[\begin{equation*}
\widehat{\mu}^U(\mathbf{x}) = E\{\widehat{\mu}^B(\mathbf{x}, D) | X = \mathbf{x}\}.
\end{equation*}\]
This maintains consistency with the best-estimate premium, whether \(\widehat{\mu}^B\) is data-driven or based on indicated rates (Complement 5 of the main paper). One can use a regression model to predict \(\widehat{\mu}^B(\mathbf{X}, D)\) based on \(\mathbf{X}\) alone. The loss function defaults to mean squared error, which corresponds to a reasonable distributional assumption for this response variable.
Alternatively, one can estimate the propensity of \(D\) given \(\mathbf{X}\) and explicitly weight the best-estimate premiums by the predicted propensities; this is the approach we take. It also preserves consistency with the best estimate.
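Given predicted propensities, the weighting step reduces to a sum over the values of \(D\). A minimal sketch with hypothetical toy objects (mu_B and p_d_x below are illustrative, not objects from this section; rows index policies, columns the values of \(D\)):

```r
# Hedged sketch: unaware premium as a propensity-weighted average of
# best-estimate premiums, mu_U(x) = sum_d mu_B(x, d) * Pr(D = d | X = x).
mu_B <- matrix(c(100, 140,    # policy 1: mu_B(x, d1), mu_B(x, d2)
                  80, 120),   # policy 2
               nrow = 2, byrow = TRUE)
p_d_x <- matrix(c(0.7, 0.3,   # policy 1: predicted Pr(D = d | X = x)
                  0.4, 0.6),  # policy 2
                nrow = 2, byrow = TRUE)

mu_U <- rowSums(mu_B * p_d_x)  # E{ mu_B(x, D) | X = x }
mu_U  # 112 and 104 for the two example policies
```

Each row of p_d_x sums to one, so the unaware premium is a convex combination of the best-estimate premiums across the protected groups.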
Pdx for scenario: Scenario1
hyperparam 1 / 1
Best valid binary logloss: 0.5817981
optimal ntree: 360
Training time: 7.743604 sec.
Pdx for scenario: Scenario2
hyperparam 1 / 1
Best valid binary logloss: 0.5789044
optimal ntree: 405
Training time: 9.595778 sec.
Pdx for scenario: Scenario3
hyperparam 1 / 1
Best valid binary logloss: 0.6345684
optimal ntree: 492
Training time: 12.89847 sec.
Training the propensity model on experimental data (for later use in section 5)
Pdx for scenario: sim1
hyperparam 1 / 1
Best valid binary logloss: 0.6379993
optimal ntree: 116
Training time: 1.948016 sec.
Pdx for scenario: sim2
hyperparam 1 / 1
Best valid binary logloss: 0.604375
optimal ntree: 160
Training time: 2.083842 sec.
Pdx for scenario: sim3
hyperparam 1 / 1
Best valid binary logloss: 0.5575499
optimal ntree: 414
Training time: 3.736413 sec.
Pdx for scenario: sim4
hyperparam 1 / 1
Best valid binary logloss: 0.5971327
optimal ntree: 159
Training time: 2.640875 sec.
Pdx for scenario: sim5
hyperparam 1 / 1
Best valid binary logloss: 0.5904888
optimal ntree: 341
Training time: 4.062044 sec.
Pdx for scenario: sim6
hyperparam 1 / 1
Best valid binary logloss: 0.6250062
optimal ntree: 147
Training time: 2.587351 sec.
Pdx for scenario: sim7
hyperparam 1 / 1
Best valid binary logloss: 0.6080139
optimal ntree: 219
Training time: 2.808161 sec.
Pdx for scenario: sim8
hyperparam 1 / 1
Best valid binary logloss: 0.5644839
optimal ntree: 263
Training time: 3.051121 sec.
Pdx for scenario: sim9
hyperparam 1 / 1
Best valid binary logloss: 0.6110802
optimal ntree: 187
Training time: 2.883613 sec.
Pdx for scenario: sim10
hyperparam 1 / 1
Best valid binary logloss: 0.6345554
optimal ntree: 151
Training time: 2.635355 sec.
Pdx for scenario: sim11
hyperparam 1 / 1
Best valid binary logloss: 0.5686411
optimal ntree: 331
Training time: 3.734863 sec.
Pdx for scenario: sim12
hyperparam 1 / 1
Best valid binary logloss: 0.6159331
optimal ntree: 165
Training time: 2.735916 sec.
Pdx for scenario: sim13
hyperparam 1 / 1
Best valid binary logloss: 0.5849873
optimal ntree: 204
Training time: 3.091587 sec.
Pdx for scenario: sim14
hyperparam 1 / 1
Best valid binary logloss: 0.5851934
optimal ntree: 298
Training time: 3.419677 sec.
Pdx for scenario: sim15
hyperparam 1 / 1
Best valid binary logloss: 0.5825115
optimal ntree: 224
Training time: 3.208649 sec.
Pdx for scenario: sim16
hyperparam 1 / 1
Best valid binary logloss: 0.6126702
optimal ntree: 183
Training time: 3.123254 sec.
Pdx for scenario: sim17
hyperparam 1 / 1
Best valid binary logloss: 0.5983043
optimal ntree: 193
Training time: 5.23823 sec.
Pdx for scenario: sim18
hyperparam 1 / 1
Best valid binary logloss: 0.5970047
optimal ntree: 225
Training time: 3.991795 sec.
Pdx for scenario: sim19
hyperparam 1 / 1
Best valid binary logloss: 0.6382285
optimal ntree: 121
Training time: 2.463771 sec.
Pdx for scenario: sim20
hyperparam 1 / 1
Best valid binary logloss: 0.6025327
optimal ntree: 214
Training time: 3.472324 sec.
Pdx for scenario: sim21
hyperparam 1 / 1
Best valid binary logloss: 0.5378712
optimal ntree: 323
Training time: 3.619435 sec.
Pdx for scenario: sim22
hyperparam 1 / 1
Best valid binary logloss: 0.5684613
optimal ntree: 277
Training time: 3.598076 sec.
Pdx for scenario: sim23
hyperparam 1 / 1
Best valid binary logloss: 0.6079414
optimal ntree: 142
Training time: 2.256139 sec.
Pdx for scenario: sim24
hyperparam 1 / 1
Best valid binary logloss: 0.614944
optimal ntree: 157
Training time: 3.527215 sec.
Pdx for scenario: sim25
hyperparam 1 / 1
Best valid binary logloss: 0.6057661
optimal ntree: 229
Training time: 4.347793 sec.
Pdx for scenario: sim26
hyperparam 1 / 1
Best valid binary logloss: 0.5835863
optimal ntree: 217
Training time: 4.235283 sec.
Pdx for scenario: sim27
hyperparam 1 / 1
Best valid binary logloss: 0.5684134
optimal ntree: 326
Training time: 4.756191 sec.
Pdx for scenario: sim28
hyperparam 1 / 1
Best valid binary logloss: 0.6112137
optimal ntree: 235
Training time: 4.134582 sec.
Pdx for scenario: sim29
hyperparam 1 / 1
Best valid binary logloss: 0.5780127
optimal ntree: 317
Training time: 4.11689 sec.
Pdx for scenario: sim30
hyperparam 1 / 1
Best valid binary logloss: 0.6275452
optimal ntree: 131
Training time: 2.157132 sec.
Pdx for scenario: sim31
hyperparam 1 / 1
Best valid binary logloss: 0.5804082
optimal ntree: 181
Training time: 2.921597 sec.
Pdx for scenario: sim32
hyperparam 1 / 1
Best valid binary logloss: 0.5763533
optimal ntree: 351
Training time: 3.937903 sec.
Pdx for scenario: sim33
hyperparam 1 / 1
Best valid binary logloss: 0.5950109
optimal ntree: 204
Training time: 3.363517 sec.
Pdx for scenario: sim34
hyperparam 1 / 1
Best valid binary logloss: 0.5986753
optimal ntree: 192
Training time: 3.189388 sec.
Pdx for scenario: sim35
hyperparam 1 / 1
Best valid binary logloss: 0.5919205
optimal ntree: 630
Training time: 5.726014 sec.
Pdx for scenario: sim36
hyperparam 1 / 1
Best valid binary logloss: 0.6075012
optimal ntree: 171
Training time: 3.031379 sec.
Pdx for scenario: sim37
hyperparam 1 / 1
Best valid binary logloss: 0.6198464
optimal ntree: 154
Training time: 2.906674 sec.
Pdx for scenario: sim38
hyperparam 1 / 1
Best valid binary logloss: 0.6218275
optimal ntree: 145
Training time: 2.613921 sec.
Pdx for scenario: sim39
hyperparam 1 / 1
Best valid binary logloss: 0.5972373
optimal ntree: 217
Training time: 3.040666 sec.
Pdx for scenario: sim40
hyperparam 1 / 1
Best valid binary logloss: 0.6137383
optimal ntree: 128
Training time: 2.440228 sec.
Pdx for scenario: sim41
hyperparam 1 / 1
Best valid binary logloss: 0.5639046
optimal ntree: 388
Training time: 6.495343 sec.
Pdx for scenario: sim42
hyperparam 1 / 1
Best valid binary logloss: 0.6043419
optimal ntree: 188
Training time: 4.328391 sec.
Pdx for scenario: sim43
hyperparam 1 / 1
Best valid binary logloss: 0.5705812
optimal ntree: 237
Training time: 7.522187 sec.
Pdx for scenario: sim44
hyperparam 1 / 1
Best valid binary logloss: 0.6216577
optimal ntree: 146
Training time: 4.39523 sec.
Pdx for scenario: sim45
hyperparam 1 / 1
Best valid binary logloss: 0.6077637
optimal ntree: 316
Training time: 6.774113 sec.
Pdx for scenario: sim46
hyperparam 1 / 1
Best valid binary logloss: 0.5901497
optimal ntree: 247
Training time: 5.998063 sec.
Pdx for scenario: sim47
hyperparam 1 / 1
Best valid binary logloss: 0.5579147
optimal ntree: 307
Training time: 5.896523 sec.
Pdx for scenario: sim48
hyperparam 1 / 1
Best valid binary logloss: 0.5843539
optimal ntree: 343
Training time: 6.286552 sec.
Pdx for scenario: sim49
hyperparam 1 / 1
Best valid binary logloss: 0.614583
optimal ntree: 160
Training time: 3.865074 sec.
Pdx for scenario: sim50
hyperparam 1 / 1
Best valid binary logloss: 0.5732406
optimal ntree: 459
Training time: 8.22917 sec.
Pdx for scenario: sim51
hyperparam 1 / 1
Best valid binary logloss: 0.5708289
optimal ntree: 235
Training time: 5.010983 sec.
Pdx for scenario: sim52
hyperparam 1 / 1
Best valid binary logloss: 0.5913323
optimal ntree: 230
Training time: 3.850774 sec.
Pdx for scenario: sim53
hyperparam 1 / 1
Best valid binary logloss: 0.6106269
optimal ntree: 191
Training time: 3.069129 sec.
Pdx for scenario: sim54
hyperparam 1 / 1
Best valid binary logloss: 0.595427
optimal ntree: 208
Training time: 3.47573 sec.
Pdx for scenario: sim55
hyperparam 1 / 1
Best valid binary logloss: 0.588144
optimal ntree: 306
Training time: 5.16458 sec.
Pdx for scenario: sim56
hyperparam 1 / 1
Best valid binary logloss: 0.6320633
optimal ntree: 134
Training time: 2.535409 sec.
Pdx for scenario: sim57
hyperparam 1 / 1
Best valid binary logloss: 0.5685531
optimal ntree: 275
Training time: 3.459567 sec.
Pdx for scenario: sim58
hyperparam 1 / 1
Best valid binary logloss: 0.5728599
optimal ntree: 300
Training time: 4.048972 sec.
Pdx for scenario: sim59
hyperparam 1 / 1
Best valid binary logloss: 0.5752295
optimal ntree: 257
Training time: 4.667999 sec.
Pdx for scenario: sim60
hyperparam 1 / 1
Best valid binary logloss: 0.561215
optimal ntree: 371
Training time: 4.729726 sec.
Pdx for scenario: sim61
hyperparam 1 / 1
Best valid binary logloss: 0.5745739
optimal ntree: 212
Training time: 3.996554 sec.
Pdx for scenario: sim62
hyperparam 1 / 1
Best valid binary logloss: 0.5677081
optimal ntree: 382
Training time: 5.065061 sec.
Pdx for scenario: sim63
hyperparam 1 / 1
Best valid binary logloss: 0.5894894
optimal ntree: 211
Training time: 3.250722 sec.
Pdx for scenario: sim64
hyperparam 1 / 1
Best valid binary logloss: 0.5963445
optimal ntree: 349
Training time: 4.712366 sec.
Pdx for scenario: sim65
hyperparam 1 / 1
Best valid binary logloss: 0.646684
optimal ntree: 116
Training time: 2.191305 sec.
Pdx for scenario: sim66
hyperparam 1 / 1
Best valid binary logloss: 0.6212037
optimal ntree: 126
Training time: 2.160214 sec.
Pdx for scenario: sim67
hyperparam 1 / 1
Best valid binary logloss: 0.6023696
optimal ntree: 202
Training time: 3.273964 sec.
Pdx for scenario: sim68
hyperparam 1 / 1
Best valid binary logloss: 0.5888413
optimal ntree: 310
Training time: 3.801854 sec.
Pdx for scenario: sim69
hyperparam 1 / 1
Best valid binary logloss: 0.6065003
optimal ntree: 155
Training time: 2.364828 sec.
Pdx for scenario: sim70
hyperparam 1 / 1
Best valid binary logloss: 0.6173803
optimal ntree: 149
Training time: 2.553649 sec.
Pdx for scenario: sim71
hyperparam 1 / 1
Best valid binary logloss: 0.5845285
optimal ntree: 327
Training time: 3.706515 sec.
Pdx for scenario: sim72
hyperparam 1 / 1
Best valid binary logloss: 0.5860112
optimal ntree: 217
Training time: 2.915491 sec.
Pdx for scenario: sim73
hyperparam 1 / 1
Best valid binary logloss: 0.5910884
optimal ntree: 259
Training time: 3.716627 sec.
Pdx for scenario: sim74
hyperparam 1 / 1
Best valid binary logloss: 0.5811708
optimal ntree: 331
Training time: 4.306144 sec.
Pdx for scenario: sim75
hyperparam 1 / 1
Best valid binary logloss: 0.5682469
optimal ntree: 407
Training time: 5.752871 sec.
Pdx for scenario: sim76
hyperparam 1 / 1
Best valid binary logloss: 0.6063615
optimal ntree: 200
Training time: 6.784282 sec.
Pdx for scenario: sim77
hyperparam 1 / 1
Best valid binary logloss: 0.5836817
optimal ntree: 340
Training time: 9.837525 sec.
Pdx for scenario: sim78
hyperparam 1 / 1
Best valid binary logloss: 0.6177518
optimal ntree: 156
Training time: 6.528549 sec.
Pdx for scenario: sim79
hyperparam 1 / 1
Best valid binary logloss: 0.6146692
optimal ntree: 134
Training time: 6.647015 sec.
Pdx for scenario: sim80
hyperparam 1 / 1
Best valid binary logloss: 0.6070313
optimal ntree: 228
Training time: 8.042032 sec.
Pdx for scenario: sim81
hyperparam 1 / 1
Best valid binary logloss: 0.5967104
optimal ntree: 216
Training time: 7.600263 sec.
Pdx for scenario: sim82
hyperparam 1 / 1
Best valid binary logloss: 0.602811
optimal ntree: 212
Training time: 6.915087 sec.
Pdx for scenario: sim83
hyperparam 1 / 1
Best valid binary logloss: 0.5677336
optimal ntree: 214
Training time: 7.670902 sec.
Pdx for scenario: sim84
hyperparam 1 / 1
Best valid binary logloss: 0.6145164
optimal ntree: 187
Training time: 6.004043 sec.
Pdx for scenario: sim85
hyperparam 1 / 1
Best valid binary logloss: 0.5937969
optimal ntree: 230
Training time: 6.345938 sec.
Pdx for scenario: sim86
hyperparam 1 / 1
Best valid binary logloss: 0.6027287
optimal ntree: 276
Training time: 7.571508 sec.
Pdx for scenario: sim87
hyperparam 1 / 1
Best valid binary logloss: 0.5771001
optimal ntree: 200
Training time: 7.213918 sec.
Pdx for scenario: sim88
hyperparam 1 / 1
Best valid binary logloss: 0.6156868
optimal ntree: 146
Training time: 6.566042 sec.
Pdx for scenario: sim89
hyperparam 1 / 1
Best valid binary logloss: 0.6262274
optimal ntree: 162
Training time: 5.564465 sec.
Pdx for scenario: sim90
hyperparam 1 / 1
Best valid binary logloss: 0.5610517
optimal ntree: 251
Training time: 7.157968 sec.
Pdx for scenario: sim91
hyperparam 1 / 1
Best valid binary logloss: 0.5692756
optimal ntree: 362
Training time: 7.883253 sec.
Pdx for scenario: sim92
hyperparam 1 / 1
Best valid binary logloss: 0.5620006
optimal ntree: 492
Training time: 10.35524 sec.
Pdx for scenario: sim93
hyperparam 1 / 1
Best valid binary logloss: 0.5907637
optimal ntree: 178
Training time: 6.044312 sec.
Pdx for scenario: sim94
hyperparam 1 / 1
Best valid binary logloss: 0.5977362
optimal ntree: 209
Training time: 7.359194 sec.
Pdx for scenario: sim95
hyperparam 1 / 1
Best valid binary logloss: 0.6226825
optimal ntree: 224
Training time: 7.610741 sec.
Pdx for scenario: sim96
hyperparam 1 / 1
Best valid binary logloss: 0.59383
optimal ntree: 202
Training time: 9.454721 sec.
Pdx for scenario: sim97
hyperparam 1 / 1
Best valid binary logloss: 0.5356933
optimal ntree: 420
Training time: 10.42067 sec.
Pdx for scenario: sim98
hyperparam 1 / 1
Best valid binary logloss: 0.5934672
optimal ntree: 250
Training time: 10.3707 sec.
Pdx for scenario: sim99
hyperparam 1 / 1
Best valid binary logloss: 0.5891839
optimal ntree: 163
Training time: 7.269016 sec.
Pdx for scenario: sim100
hyperparam 1 / 1
Best valid binary logloss: 0.6119387
optimal ntree: 180
Training time: 3.10501 sec.
2.3 Aware premium
The aware premium, \(\widehat{\mu}^A(\mathbf{x})\), requires knowledge of the marginal distribution of \(D\). The aware price, a particular case of the “discrimination-free” price of Lindholm et al. (2022), is computed as: \[\begin{equation*}
\widehat{\mu}^A(\mathbf{x}) = \sum_{d\in \mathcal{D}} \widehat{\mu}^B(\mathbf{x}, d) \widehat{\Pr}(D = d).
\end{equation*}\] As discussed earlier, we assume \(D\) is discrete or discretized. If the training sample is representative of the target population (portfolio, market, or region), the empirical proportions are consistent estimators of \(\Pr(D = d)\). This estimator is also suggested in Section 5 of Lindholm et al. (2022).
Computing the empirical proportions for protected subpopulations
marg_dist <- sims %>%
  lapply(function(data){
    data$D %>% table %>% prop.table
  })

## for later use
saveRDS(marg_dist, 'preds/the_empirical_proportions.rds')

## Computing the empirical proportions for protected subpopulations on experimental data (for later use in section 5)
marg_dist_sims <- seq_along(sim_samples) %>%
  lapply(function(idx){
    sim_samples[[idx]]$train$D %>% table %>% prop.table
  })
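Once the marginal proportions \(\widehat{\Pr}(D = d)\) are available, the aware premium is a weighted average of the best-estimate premiums. A minimal sketch with hypothetical toy numbers (mu_B and p_marg are illustrative, not objects from this section):

```r
# Hedged sketch: aware premium from best-estimate premiums and the
# empirical marginal distribution of D.
mu_B <- matrix(c(100, 140,            # policy 1: mu_B(x, d1), mu_B(x, d2)
                  80, 120),           # policy 2
               nrow = 2, byrow = TRUE)
p_marg <- c(0.6, 0.4)                 # empirical Pr(D = d1), Pr(D = d2)

mu_A <- as.vector(mu_B %*% p_marg)   # sum_d mu_B(x, d) * Pr(D = d)
mu_A  # 116 and 96 for the two example policies
```

Unlike the unaware premium, the same weights apply to every policy, which is what removes any dependence of the price on \(\mathbf{x}\) through \(D\).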
Note
This method generalizes the use of \(D\) as a control variable to prevent distortions in the estimated effect of \(\mathbf{X}\) due to omitted variable bias. It applies to all model types and has a causal interpretation under the assumptions that: (i) the possible values of \(\mathbf{X}\) are observed across all groups of \(D\) (no extrapolation), (ii) \(D\) causes both \(\mathbf{X}\) and \(Y\), and (iii) there are no relevant unobserved variables. Under the same assumptions, other methods also yield estimators of the aware premium. One example is inverse probability weighting, which re-weights observations to (artificially) remove the association between \(\mathbf{X}\) and \(D\), preventing proxy effects.
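To make the inverse probability weighting alternative concrete, here is a minimal numpy sketch (with invented propensities) of the weights \(w_i = \Pr(D = d_i)/\Pr(D = d_i \mid \mathbf{X} = \mathbf{x}_i)\) that break the association between \(\mathbf{X}\) and \(D\):

```python
import numpy as np

# Hypothetical data: observed group d_i, marginal Pr(D = d), and
# propensities Pr(D = d_i | X = x_i) from some classifier (all invented).
d = np.array([0, 1, 1, 0])
pr_marg = np.array([0.5, 0.5])
pr_cond = np.array([0.8, 0.6, 0.4, 0.3])
# Stabilized IPW weights: observations whose group is unexpected given
# their covariates are upweighted.
w = pr_marg[d] / pr_cond
```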
2.4 Corrective premium
We estimate the corrective premium, \(\widehat{\mu}^C(\mathbf{x}, d)\), designed to enforce strong demographic parity by aligning premium distributions across protected groups. Various methods can achieve this, but we recommend the optimal transport approach of Hu, Ratz, and Charpentier (2024). Its advantage lies in modifying \(\widehat{\mu}^B(\mathbf{x}, d)\) while keeping the overall spectrum estimation anchored to the initial best-estimate premium. As an incremental adjustment, it minimizes modeling complexity while enforcing demographic parity, yielding a simple estimator for the corrective premium.
The corrective premium is computed by training a transport function that shifts best-estimate premiums for each protected group toward a common barycenter, ensuring demographic parity through corrective direct discrimination. See Hu, Ratz, and Charpentier (2024) for details. This optimal transport method is implemented in Python via the Equipy package of Fernandes Machado et al. (2025). Listing 2.1 illustrates how this method, despite its complexity, translates into concise Python code.
Listing 2.1: Illustrative Python code for Wasserstein transport toward corrective premiums.
import numpy as np
import pandas as pd
from equipy.fairness import FairWasserstein

# Load best-estimate premiums and associated sensitive attributes
best_est_train = pd.read_csv('best_est_prem_train.csv')  # Training premiums
sens_train = pd.read_csv('sensitive_train.csv')          # discrete sensitive data

# Train Fair Wasserstein transport to adjust premiums
barycenter = FairWasserstein()
barycenter.fit(best_est_train.values, sens_train.values)

# Load test premiums and associated sensitive attributes
best_est_test = pd.read_csv('best_est_prem_test.csv')  # Test premiums
sens_test = pd.read_csv('sensitive_test.csv')          # discrete sensitive data

# Apply transformation to obtain fairness-adjusted premiums
corrective_premiums_test = barycenter.transform(best_est_test.values, sens_test.values)

# Save results
pd.DataFrame(corrective_premiums_test).to_csv('corr_prem_test.csv', index=False)
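For intuition on what the Wasserstein transport does in the univariate case, the following self-contained numpy sketch (not the Equipy implementation; distributions and weights are invented) maps a premium to its within-group rank and evaluates the population-weighted average of the group quantile functions at that rank, which is the one-dimensional barycenter transport:

```python
import numpy as np

# Invented group premium distributions and population weights.
rng = np.random.default_rng(0)
prem = {"d1": rng.gamma(2.0, 250.0, 1000),
        "d2": rng.gamma(2.0, 300.0, 1000)}
w = {"d1": 0.4, "d2": 0.6}

def corrective(p, d):
    """Transport premium p of group d toward the Wasserstein barycenter."""
    u = (prem[d] < p).mean()  # within-group rank of p
    # Weighted average of the group quantile functions at rank u.
    return sum(w[g] * np.quantile(prem[g], u) for g in prem)
```

Two policyholders at the same rank within their respective groups receive the same transported premium, which is how demographic parity is achieved.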
Computing the optimal transport mapping for the scenarios
source_python("___opt_transp.py")
## now, the folder 'transported' exists, with one epsilon_y file per scenario
To predict corrective premiums on other datasets, one can alternatively train a lightgbm model mapping \((\mathbf{X}, D)\) to the corrective premiums produced by Equipy on the training sample.
Training the lightgbm mapping to corrective premiums for the scenarios
Mapping to corr. for scenario: Scenario1
Data import: 0.3252981 sec.
Data conversion: 0.01066899 sec.
Lgb training : 17.72908 sec.
Mapping to corr. for scenario: Scenario2
Data import: 0.296618 sec.
Data conversion: 0.01664019 sec.
Lgb training : 8.731765 sec.
Mapping to corr. for scenario: Scenario3
Data import: 0.2445309 sec.
Data conversion: 0.02969599 sec.
Lgb training : 21.20496 sec.
2.5 Hyperaware premium
The hyperaware premium, \(\widehat{\mu}^H(\mathbf{x})\), is the best non-directly discriminatory approximation of the corrective premium. We fit a lightgbm model using only \(\mathbf{X}\) to predict the corrective premium, targeting demographic parity without direct reliance on \(D\): \[\begin{equation*}
\widehat{\mu}^H(\mathbf{x}) = E\{\widehat{\mu}^C(\mathbf{x}, D) \mid \mathbf{X} = \mathbf{x}\}.
\end{equation*}\]
Since the response variable is a premium, MSE is an appropriate loss function in most cases. Just as for the unaware premium, one can explicitly aggregate corrective premiums across protected groups using the estimated propensities as weights; this is what we do here.
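The explicit aggregation can be sketched with invented numbers: for each profile, the hyperaware premium weights the corrective premiums by the estimated propensities \(\widehat{\Pr}(D = d \mid \mathbf{X} = \mathbf{x})\):

```python
import numpy as np

# Illustrative corrective premiums mu^C(x_i, d): rows are profiles,
# columns are levels of D (all values invented).
mu_C = np.array([[510.0, 490.0],
                 [620.0, 580.0]])
# Estimated propensities Pr(D = d | X = x_i); each row sums to one.
pdx = np.array([[0.3, 0.7],
                [0.5, 0.5]])
# Hyperaware premiums: propensity-weighted average over D for each profile.
mu_H = (mu_C * pdx).sum(axis=1)
```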
2.6 Visualizing the estimated spectrum
By combining all the trained models as components of a trained spectrum of fairness, we can define the premium functions that will serve to predict the premiums.
Defining the estimated premium and metrics functions
premiums <- setNames(nm = names(sims)) %>%
  lapply(function(name){
    premium_generator(best = best_lgb[[name]]$pred_fun,
                      pdx = pdx_lgb[[name]]$pred_fun,
                      maps_to_corr = corr_mapping_lgb[[name]],
                      marg = marg_dist[[name]])
  })
quants <- setNames(nm = names(sims)) %>%
  lapply(function(name){
    quant_generator(premiums = premiums[[name]])
  })
## For experimental data (future use in section 5)
premiums_sims <- setNames(obj = seq_along(sim_samples),
                          nm = names(sim_samples)) %>%
  lapply(function(idx){
    premium_generator(best = best_sims[[idx]]$pred_fun,
                      pdx = pdx_sims[[idx]]$pred_fun,
                      maps_to_corr = corr_mapping_lgb[['Scenario1']], ## We put anything here; it won't be used, as section 5 focuses on proxy vulnerability.
                      marg = marg_dist_sims[[idx]])
  })
quants_sims <- setNames(obj = seq_along(sim_samples),
                        nm = names(sim_samples)) %>%
  lapply(function(idx){
    quant_generator(premiums = premiums_sims[[idx]])
  })
Computation of estimated premiums and local metrics across the grid
df_to_g_file <- "preds/df_to_g.json"
# Check if the JSON file exists
if (file.exists(df_to_g_file)) {
  message(sprintf("[%s] File exists. Reading df_to_g from %s", Sys.time(), df_to_g_file))
  df_to_g <- fromJSON(df_to_g_file)
} else {
  ## From the first section of the online supplement
  preds_grid_stats_theo <- fromJSON('preds/preds_grid_stats_theo.json')
  df_to_g <- setNames(nm = names(sims)) %>%
    lapply(function(name) {
      message(sprintf("[%s] Processing: %s", Sys.time(), name))
      local_scenario_df <- preds_grid_stats_theo[[name]]
      # Start time for this scenario
      start_time <- Sys.time()
      # Step 1: Compute premiums
      message(sprintf("[%s] Step 1: Computing premiums", Sys.time()))
      premium_results <- setNames(nm = levels_for_premiums) %>%
        sapply(function(s) {
          message(sprintf("[%s] Computing premium: %s", Sys.time(), s))
          premiums[[name]][[s]](x1 = local_scenario_df$x1,
                                x2 = local_scenario_df$x2,
                                d = local_scenario_df$d)
        })
      # Step 2: Compute PDX
      message(sprintf("[%s] Step 2: Computing PDX", Sys.time()))
      pdx_results <- pdx_lgb[[name]]$pred_fun(
        data.frame(X1 = local_scenario_df$x1,
                   X2 = local_scenario_df$x2,
                   D = local_scenario_df$d)
      )
      # Step 3: Combine results
      message(sprintf("[%s] Step 3: Combining results", Sys.time()))
      result <- data.frame(local_scenario_df,
                           premium_results,
                           pdx = pdx_results)
      # Log completion time
      end_time <- Sys.time()
      message(sprintf("[%s] Finished processing: %s (Duration: %.2f seconds)",
                      end_time, name,
                      as.numeric(difftime(end_time, start_time, units = "secs"))))
      return(result)
    })
  # Save the entire df_to_g object to JSON
  message(sprintf("[%s] Saving df_to_g to %s", Sys.time(), df_to_g_file))
  toJSON(df_to_g, pretty = TRUE, auto_unbox = TRUE) %>% write(df_to_g_file)
  rm(preds_grid_stats_theo)
}

grid_stats_path <- 'preds/preds_grid_stats.json'
# Check and load or compute preds_grid_stats
if (file.exists(grid_stats_path)) {
  preds_grid_stats <- fromJSON(grid_stats_path)
} else {
  preds_grid_stats <- setNames(nm = names(df_to_g)) %>%
    lapply(function(name) {
      data.frame(df_to_g[[name]],
                 setNames(nm = levels_for_quants) %>%
                   sapply(function(s) {
                     quants[[name]][[s]](x1 = df_to_g[[name]]$x1,
                                         x2 = df_to_g[[name]]$x2,
                                         d = df_to_g[[name]]$d)
                   }))
    })
  toJSON(preds_grid_stats, pretty = TRUE, auto_unbox = TRUE) %>% write(grid_stats_path)
}
Computing estimated premiums and local metrics for the simulations
pop_to_g_file <- "preds/pop_to_g.json"
# Check if the JSON file exists
if (file.exists(pop_to_g_file)) {
  message(sprintf("[%s] File exists. Reading pop_to_g from %s", Sys.time(), pop_to_g_file))
  pop_to_g <- fromJSON(pop_to_g_file)
} else {
  ## From the first section of the online supplement
  preds_pop_stats_theo <- fromJSON('preds/preds_pop_stats_theo.json')
  pop_to_g <- setNames(nm = names(sims)) %>%
    lapply(function(name) {
      message(sprintf("[%s] Processing: %s", Sys.time(), name))
      # Start time for this simulation
      start_time <- Sys.time()
      list_data <- list('train' = sims[[name]],
                        'valid' = valid[[name]],
                        'test'  = test[[name]])
      result <- setNames(nm = names(list_data)) %>%
        lapply(function(nm){
          data <- list_data[[nm]]
          # Step 1: Compute premiums
          message(sprintf("[%s] Step 1: Computing premiums", Sys.time()))
          premium_results <- setNames(nm = levels_for_premiums) %>%
            sapply(function(s) {
              message(sprintf("[%s] Computing premium: %s", Sys.time(), s))
              premiums[[name]][[s]](x1 = data$X1,
                                    x2 = data$X2,
                                    d = data$D)
            })
          # Step 2: Compute PDX
          message(sprintf("[%s] Step 2: Computing PDX", Sys.time()))
          pdx_results <- pdx_lgb[[name]]$pred_fun(
            data.frame(X1 = data$X1, X2 = data$X2, D = data$D)
          )
          # Step 3: Combine results
          message(sprintf("[%s] Step 3: Combining results", Sys.time()))
          data.frame(preds_pop_stats_theo[[name]][[nm]],
                     premium_results,
                     pdx = pdx_results)
        })
      # Log completion time
      end_time <- Sys.time()
      message(sprintf("[%s] Finished processing: %s (Duration: %.2f seconds)",
                      end_time, name,
                      as.numeric(difftime(end_time, start_time, units = "secs"))))
      return(result)
    })
  # Save the entire pop_to_g object to JSON
  message(sprintf("[%s] Saving pop_to_g to %s", Sys.time(), pop_to_g_file))
  toJSON(pop_to_g, pretty = TRUE, auto_unbox = TRUE) %>% write(pop_to_g_file)
  rm(preds_pop_stats_theo)
}

pop_stats_path <- 'preds/preds_pop_stats.json'
# Check and load or compute preds_pop_stats
if (file.exists(pop_stats_path)) {
  preds_pop_stats <- fromJSON(pop_stats_path)
} else {
  preds_pop_stats <- setNames(nm = names(sims)) %>%
    lapply(function(name) {
      setNames(nm = names(pop_to_g[[name]])) %>%
        lapply(function(set) {
          local_df <- pop_to_g[[name]][[set]]
          data.frame(local_df,
                     setNames(nm = levels_for_quants) %>%
                       sapply(function(s) {
                         quants[[name]][[s]](x1 = local_df$X1,
                                             x2 = local_df$X2,
                                             d = local_df$D)
                       }))
        })
    })
  toJSON(preds_pop_stats, pretty = TRUE, auto_unbox = TRUE) %>% write(pop_stats_path)
}
Computing estimated premiums and local metrics for the experiment
Figure 2.2: Estimated best-estimate \(\widehat{\mu}^B\), unaware \(\widehat{\mu}^U\), and aware \(\widehat{\mu}^A\) premiums for scenarios 1, 2, and 3 in the Example as a function of \(x_1\), \(x_2\), and \(d\).
R code producing the estimated aware, hyperaware, and corrective premium illustration.
Figure 2.3: Estimated aware \(\widehat{\mu}^A\), hyperaware \(\widehat{\mu}^H\), and corrective \(\widehat{\mu}^C\) premiums for scenarios 1, 2, and 3 in the Example as a function of \(x_1\), \(x_2\), and \(d\).
Chevalier, Dominik, and Marie-Pier Côté. 2024. “From Point to Probabilistic Gradient Boosting for Claim Frequency and Severity Prediction.” arXiv preprint arXiv:2412.14916. https://arxiv.org/abs/2412.14916.
Fernandes Machado, Agathe, Suzie Grondin, Philipp Ratz, Arthur Charpentier, and François Hu. 2025. “EquiPy: Sequential Fairness Using Optimal Transport in Python.” arXiv preprint arXiv:2503.09866. https://arxiv.org/abs/2503.09866.
Hu, Francois, Philipp Ratz, and Arthur Charpentier. 2024. “A Sequentially Fair Mechanism for Multiple Sensitive Attributes.” Proceedings of the AAAI Conference on Artificial Intelligence 38 (11): 12502–10. https://doi.org/10.1609/aaai.v38i11.29143.
Ke, Guolin, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. “LightGBM: A Highly Efficient Gradient Boosting Decision Tree.” Advances in Neural Information Processing Systems 30: 3146–54.
Lindholm, Mathias, Ronald Richman, Andreas Tsanakas, and Mario V. Wüthrich. 2022. “Discrimination-Free Insurance Pricing.” ASTIN Bulletin 52 (1): 55–89.